SEMIPARAMETRICALLY EFFICIENT ESTIMATION OF
CONSTRAINED EUCLIDEAN PARAMETERS 

By Chris A.J. Klaassen* and Nanang Susyanto* 
University of Amsterdam 

Consider a quite arbitrary (semi)parametric model with a Euclidean
parameter of interest and assume that an asymptotically
(semi)parametrically efficient estimator of it is given. If the parameter
of interest is known to lie on a general surface (image of a continuously
differentiable vector valued function), we have a submodel in which
this constrained Euclidean parameter may be rewritten in terms of
a lower-dimensional Euclidean parameter of interest. An estimator
of this underlying parameter is constructed based on the original
estimator, and it is shown to be (semi)parametrically efficient. It is
proved that the efficient score function for the underlying parameter
is determined by the efficient score function for the original parameter
and the Jacobian of the function defining the general surface, via a
chain rule for score functions. Efficient estimation of the constrained
Euclidean parameter itself is considered as well.

Our general estimation method is applied to location-scale, Gaussian
copula and semiparametric regression models, and to parametric
models under linear restrictions.


1. Introduction. Let $X_1, \dots, X_n$ be i.i.d. copies of $X$ taking values
in the measurable space $(\mathcal{X}, \mathcal{A})$ in a semiparametric model with Euclidean
parameter $\theta \in \Theta$, where $\Theta$ is an open subset of $\mathbb{R}^k$. We denote this
semiparametric model by

(1.1)  $\mathcal{P} = \{P_{\theta,G} : \theta \in \Theta,\ G \in \mathcal{G}\}.$

Typically, the nuisance parameter space $\mathcal{G}$ is a subset of a Banach or Hilbert
space. This space may also be finite dimensional, thus resulting in a
parametric model.

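We assume an asymptotically efficient estimator $\hat\theta_n = \hat\theta_n(X_1, \dots, X_n)$ is
given of the parameter of interest $\theta$, which under regularity conditions means
that

(1.2)  $\sqrt{n}\Big(\hat\theta_n - \theta - \frac{1}{n}\sum_{i=1}^n \tilde\ell(X_i;\theta,G,\mathcal{P})\Big) \to_{P_{\theta,G}} 0$

holds. Here $\tilde\ell(\cdot;\theta,G,\mathcal{P})$ is the efficient influence function at $P_{\theta,G}$ for
estimation of $\theta$ within $\mathcal{P}$ and

(1.3)  $\dot\ell(\cdot;\theta,G,\mathcal{P}) = \Big(\int_{\mathcal{X}} \tilde\ell(x;\theta,G,\mathcal{P})\,\tilde\ell^T(x;\theta,G,\mathcal{P})\,dP_{\theta,G}(x)\Big)^{-1} \tilde\ell(\cdot;\theta,G,\mathcal{P})$

is the corresponding efficient score function at $P_{\theta,G}$ for estimation of $\theta$ within
$\mathcal{P}$.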

The topic of this paper is asymptotically efficient estimation when it is
known that $\theta$ lies on a general surface, or equivalently, when it is known
that $\theta$ is determined by a lower dimensional parameter via a continuously
differentiable function, which we denote by

(1.4)  $\theta = f(\nu), \quad \nu \in N.$

Here $f : N \subset \mathbb{R}^d \to \mathbb{R}^k$ with $d < k$ is known, $N$ is open, the Jacobian

(1.5)  $\dot f(\nu) = \Big(\frac{\partial f_i(\nu)}{\partial \nu_j}\Big)_{i=1,\dots,k;\ j=1,\dots,d}$

of $f$ is assumed to be of full rank on $N$, and $\nu$ is the unknown $d$-dimensional
parameter to be estimated. Thus, we focus on the (semi)parametric model

(1.6)  $\mathcal{Q} = \{P_{f(\nu),G} : \nu \in N,\ G \in \mathcal{G}\} \subset \mathcal{P}.$

The first main result of this paper is that a semiparametrically efficient
estimator of $\nu$, the parameter of interest, has to be asymptotically linear
with efficient score function for estimation of $\nu$ equal to

(1.7)  $\dot\ell(\cdot;\nu,G,\mathcal{Q}) = \dot f^T(\nu)\,\dot\ell(\cdot;\theta,G,\mathcal{P}).$

Such a semiparametrically efficient estimator of the parameter of interest can
be defined in terms of $f(\cdot)$ and the efficient estimator $\hat\theta_n$ of $\theta$; see equation
(4.1) in Section 4. This is our second main result. How (1.7) is related
to the chain rule for differentiation will be explained in Section 2, which
proves this chain rule for score functions. The semiparametric lower bound
for estimators of $\nu$ is obtained via the Hájek-LeCam Convolution Theorem
for regular parametric models and without projection techniques in Section
3. In Section 4 efficient estimators within $\mathcal{Q}$ of $\nu$ and $\theta$ are constructed, as
well as efficient estimators of $\theta$ under linear restrictions on $\theta$. The generality
of our approach facilitates the analysis of numerous statistical models. We
discuss some such parametric and semiparametric models and related
literature in Section 5. One of the proofs will be given in Appendix A.

The topic of this paper should not be confused with estimation of the
parameter $\theta$ when it is known to lie in a subset of the original parameter
space described by linear inequalities. A comprehensive treatment of such
estimation problems may be found in Van Eeden (2006). Our model $\mathcal{Q}$ with
its constrained Euclidean parameters also differs from the constraint-defined
models as studied by Bickel et al. (1993, 1998) (henceforth called BKRW),
which are defined by restrictions on the distributions in $\mathcal{P}$.

2. The Chain Rule for Score Functions. The basic building block
for the asymptotic theory of semiparametric models as presented in e.g.
BKRW (1993) is the concept of regular parametric model. Let $\mathcal{P}_\Theta = \{P_\theta : \theta \in \Theta\}$
with $\Theta \subset \mathbb{R}^k$ open be a parametric model with all $P_\theta$ dominated by a
$\sigma$-finite measure $\mu$ on $(\mathcal{X}, \mathcal{A})$. Denote the density of $P_\theta$ with respect to $\mu$ by
$p(\theta) = p(\cdot;\theta,\mathcal{P}_\Theta)$ and the $L_2(\mu)$-norm by $\|\cdot\|_\mu$. If for each $\theta_0 \in \Theta$ there
exists a $k$-dimensional column vector $\dot\ell(\theta_0,\mathcal{P}_\Theta)$ of elements of $L_2(P_{\theta_0})$, the
so-called score function, such that the Fréchet differentiability

(2.1)  $\big\| \sqrt{p(\theta)} - \sqrt{p(\theta_0)} - \tfrac{1}{2}(\theta-\theta_0)^T \dot\ell(\theta_0,\mathcal{P}_\Theta)\sqrt{p(\theta_0)} \big\|_\mu = o(|\theta-\theta_0|), \quad \theta \to \theta_0,$

holds and the $k \times k$ Fisher information matrix

(2.2)  $I(\theta_0) = \int_{\mathcal{X}} \dot\ell(\theta_0,\mathcal{P}_\Theta)\,\dot\ell^T(\theta_0,\mathcal{P}_\Theta)\,dP_{\theta_0}$

is nonsingular, and, moreover, the map $\theta \mapsto \dot\ell(\theta,\mathcal{P}_\Theta)\sqrt{p(\theta)}$ from $\Theta$ to $L_2^k(\mu)$
is continuous, then $\mathcal{P}_\Theta$ is called a regular parametric model. Often the score
function may be determined by computing the logarithmic derivative of the
density with respect to $\theta$; cf. Proposition 2.1.1 of BKRW (1993). We will
call $\mathcal{P}$ from (1.1) a regular semiparametric model if for all $G \in \mathcal{G}$

(2.3)  $\mathcal{P}_{\Theta,G} = \{P_{\theta,G} : \theta \in \Theta\}$

is a regular parametric model.

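Fix $\theta_0 \in \Theta$ and $G_0 \in \mathcal{G}$, and write $P_{\theta_0,G_0} = P_0$. Let $\psi : \Theta \to \mathcal{G}$ with
$\psi(\theta_0) = G_0$ be such that

(2.4)  $\mathcal{P}_\psi = \{P_{\theta,\psi(\theta)} : \theta \in \Theta\}$

is a regular parametric submodel of $\mathcal{P}$ with score function $\dot\ell(\theta_0,\mathcal{P}_\psi)$ at $\theta_0$
and Fisher information matrix $I(\theta_0,\mathcal{P}_\psi)$, say. Let the density of $P_{\theta,\psi(\theta)}$ with
respect to $\mu$ be denoted by $q(\theta)$. Since $\mathcal{P}_\psi$ is a regular parametric model the
score function $\dot\ell(\theta_0,\mathcal{P}_\psi)$ for $\theta$ at $\theta_0$ within $\mathcal{P}_\psi$ satisfies (cf. (2.1))

(2.5)  $\big\| \sqrt{q(\theta)} - \sqrt{q(\theta_0)} - \tfrac{1}{2}(\theta-\theta_0)^T \dot\ell(\theta_0,\mathcal{P}_\psi)\sqrt{q(\theta_0)} \big\|_\mu = o(|\theta-\theta_0|), \quad \theta \to \theta_0.$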

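Considering now the (semi)parametric submodel $\mathcal{Q}$ from (1.6) we fix $\nu_0$
and write $f(\nu_0) = \theta_0$ and $f(\nu) = \theta$. Within $\mathcal{Q}$ the Fréchet differentiability
(2.5) yields

(2.6)  $\big\| \sqrt{q(f(\nu))} - \sqrt{q(f(\nu_0))} - \tfrac{1}{2}(f(\nu)-f(\nu_0))^T \dot\ell(f(\nu_0),\mathcal{P}_\psi)\sqrt{q(f(\nu_0))} \big\|_\mu = o(|f(\nu)-f(\nu_0)|), \quad f(\nu) \to f(\nu_0),$

and hence

(2.7)  $\big\| \sqrt{q(f(\nu))} - \sqrt{q(f(\nu_0))} - \tfrac{1}{2}(\nu-\nu_0)^T \dot f^T(\nu_0)\,\dot\ell(\theta_0,\mathcal{P}_\psi)\sqrt{q(f(\nu_0))} \big\|_\mu = o(|\nu-\nu_0|), \quad \nu \to \nu_0,$

in view of the differentiability of $f(\cdot)$. Since $\dot f(\cdot)$ is continuous, this means
that

(2.8)  $\mathcal{Q}_\psi = \{P_{f(\nu),\psi(f(\nu))} : \nu \in N\}$

is a regular parametric submodel of $\mathcal{Q}$ with score function

(2.9)  $\dot\ell(\nu_0,\mathcal{Q}_\psi) = \dot f^T(\nu_0)\,\dot\ell(\theta_0,\mathcal{P}_\psi)$

for $\nu$ at $P_0$ and Fisher information matrix

(2.10)  $\dot f^T(\nu_0)\,I(\theta_0,\mathcal{P}_\psi)\,\dot f(\nu_0) = \dot f^T(\nu_0) \int_{\mathcal{X}} \dot\ell(\theta_0,\mathcal{P}_\psi)\,\dot\ell^T(\theta_0,\mathcal{P}_\psi)\,dP_0\ \dot f(\nu_0).$

We have proved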

Proposition 2.1. Let $\mathcal{P}$ as in (1.1) be a regular semiparametric model
and let $\mathcal{Q}$ as in (1.6) be a regular semiparametric submodel with $f(\cdot)$ and $\dot f(\cdot)$
defined as in and below (1.4) and (1.5). If there exists a regular parametric
submodel $\mathcal{P}_\psi$ of $\mathcal{P}$ with score function $\dot\ell(\theta_0,\mathcal{P}_\psi)$ for $\theta$ at $\theta_0 = f(\nu_0)$, then there
exists a regular parametric submodel $\mathcal{Q}_\psi$ of $\mathcal{Q}$ with score function $\dot\ell(\nu_0,\mathcal{Q}_\psi)$
for $\nu$ at $\nu_0$ satisfying (2.9).
This Proposition is also valid for parametric models, as may be seen by
choosing $\mathcal{G}$ finite dimensional or even degenerate. The basic version of the
chain rule for score functions is for such a parametric model $\mathcal{P}_\Theta$. We have
chosen the more elaborate formulation of Proposition 2.1 since we are going
to apply the chain rule for such parametric submodels $\mathcal{P}_\psi$ of semiparametric
models $\mathcal{P}$.
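The chain rule (2.9) can be checked numerically in a simple parametric case. The sketch below is only an illustration under assumptions that are not part of the text: it takes the normal location-scale model with $\theta = (\mu, \sigma)^T$ and the submodel $f(\nu) = (\nu, c\nu)^T$ with a hypothetical constant `c`, and compares $\dot f^T(\nu)$ times the score for $\theta$ with a finite-difference derivative of the submodel log-density. All function names are illustrative.

```python
import numpy as np

def log_density(x, mu, sigma):
    """Log-density of the normal distribution N(mu, sigma^2) at x."""
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def score_theta(x, mu, sigma):
    """Score for theta = (mu, sigma) in the unrestricted normal model."""
    z = (x - mu) / sigma
    return np.array([z / sigma, (z ** 2 - 1) / sigma])

def score_nu_chain_rule(x, nu, c):
    """Score for nu in the submodel theta = f(nu) = (nu, c * nu), via (2.9)."""
    f_dot = np.array([1.0, c])  # Jacobian of f (k = 2, d = 1), stored as a vector
    return float(f_dot @ score_theta(x, nu, c * nu))

def score_nu_numeric(x, nu, c, h=1e-6):
    """Central-difference derivative in nu of the submodel log-density."""
    up = log_density(x, nu + h, c * (nu + h))
    down = log_density(x, nu - h, c * (nu - h))
    return (up - down) / (2 * h)
```

For instance, at `x = 1.3`, `nu = 2.0`, `c = 0.5` both functions return approximately -0.955, as the chain rule predicts.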

3. Convolution Theorem and Main Result. An estimator $\hat\theta_n$ of
$\theta$ within the regular semiparametric model $\mathcal{P}$ is called (locally) regular at
$P_0 = P_{\theta_0,G_0}$ if it is (locally) regular at $P_0$ within $\mathcal{P}_\psi$ for all regular
parametric submodels $\mathcal{P}_\psi$ of $\mathcal{P}$ containing $P_0$. According to the Hájek-LeCam
Convolution Theorem for regular parametric models (see e.g. Section 2.3 of
BKRW (1993)) this implies that such a regular estimator $\hat\theta_n$ of $\theta$ within $\mathcal{P}$
has a limit distribution under $P_0$ that is the convolution of a normal
distribution with mean 0 and covariance matrix $I^{-1}(\theta_0,\mathcal{P}_\psi)$ and another
distribution, for any regular parametric submodel $\mathcal{P}_\psi$ containing $P_0$. If there
exists $\psi = \psi_0$ such that this last distribution is degenerate at 0, we call $\hat\theta_n$
(locally) efficient at $P_0$ and $\mathcal{P}_{\psi_0}$ a least favorable parametric submodel for
estimation of $\theta$ within $\mathcal{P}$ at $P_0$. Then the Hájek-LeCam Convolution Theorem
also implies that $\hat\theta_n$ is asymptotically linear in the efficient influence
function $\tilde\ell(\theta_0,G_0,\mathcal{P}) = \tilde\ell(\cdot;\theta_0,G_0,\mathcal{P})$ satisfying

(3.1)  $\tilde\ell(\theta_0,G_0,\mathcal{P}) = \tilde\ell(\theta_0,\mathcal{P}_{\psi_0}) = I^{-1}(\theta_0,\mathcal{P}_{\psi_0})\,\dot\ell(\theta_0,\mathcal{P}_{\psi_0}),$

which means

(3.2)  $\sqrt{n}\Big(\hat\theta_n - \theta_0 - \frac{1}{n}\sum_{i=1}^n \tilde\ell(X_i;\theta_0,G_0,\mathcal{P})\Big) \to_{P_0} 0.$

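The argument above can be extended to the more general situation that
there exists a least favorable sequence of parametric submodels $\mathcal{P}_{\psi_j}$ indexed by
$\psi_j$, $j = 1, 2, \dots$, such that the corresponding score functions $\dot\ell(\theta_0,\mathcal{P}_{\psi_j})$ for $\theta$
at $\theta_0$ within model $\mathcal{P}_{\psi_j}$ converge in $L_2^k(P_0)$ to $\dot\ell(\theta_0,G_0,\mathcal{P}) = \dot\ell(\cdot;\theta_0,G_0,\mathcal{P})$,
say. A regular estimator $\hat\theta_n$ of $\theta$ within $\mathcal{P}$ is then called efficient if it is
asymptotically linear as in (3.2) with efficient influence function $\tilde\ell(\theta_0,G_0,\mathcal{P}) =
\tilde\ell(\cdot;\theta_0,G_0,\mathcal{P})$ satisfying

(3.3)  $\tilde\ell(\theta_0,G_0,\mathcal{P}) = \Big(\int_{\mathcal{X}} \dot\ell(\theta_0,G_0,\mathcal{P})\,\dot\ell^T(\theta_0,G_0,\mathcal{P})\,dP_0\Big)^{-1} \dot\ell(\theta_0,G_0,\mathcal{P}) = I^{-1}(\theta_0,G_0,\mathcal{P})\,\dot\ell(\theta_0,G_0,\mathcal{P}).$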
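Indeed, by the Convolution Theorem for regular parametric models the
convergence

(3.4)  $\begin{pmatrix} \sqrt{n}\Big(\hat\theta_n - \theta_0 - \frac{1}{n}\sum_{i=1}^n \tilde\ell(X_i;\theta_0,\mathcal{P}_{\psi_j})\Big) \\ \frac{1}{\sqrt{n}}\sum_{i=1}^n \tilde\ell(X_i;\theta_0,\mathcal{P}_{\psi_j}) \end{pmatrix} \overset{\mathcal{D}}{\longrightarrow}_{P_0} \begin{pmatrix} R_j \\ Z_j \end{pmatrix}$

holds with the $k$-vectors $R_j$ and $Z_j$ independent and $Z_j$ normal with mean
0 and covariance matrix $I^{-1}(\theta_0,\mathcal{P}_{\psi_j})$. Taking limits as $j \to \infty$ we see by
tightness arguments and by the convergence of $\dot\ell(\theta_0,\mathcal{P}_{\psi_j})$ to $\dot\ell(\theta_0,G_0,\mathcal{P})$ in
$L_2^k(P_0)$, that also

(3.5)  $\begin{pmatrix} \sqrt{n}\Big(\hat\theta_n - \theta_0 - \frac{1}{n}\sum_{i=1}^n \tilde\ell(X_i;\theta_0,G_0,\mathcal{P})\Big) \\ \frac{1}{\sqrt{n}}\sum_{i=1}^n \tilde\ell(X_i;\theta_0,G_0,\mathcal{P}) \end{pmatrix} \overset{\mathcal{D}}{\longrightarrow}_{P_0} \begin{pmatrix} R_{\mathcal{P}} \\ Z_{\mathcal{P}} \end{pmatrix}$

holds with $R_{\mathcal{P}}$ and $Z_{\mathcal{P}}$ independent. If $R_{\mathcal{P}}$ is degenerate at 0, then $\hat\theta_n$ is
locally asymptotically efficient at $P_0$ within $\mathcal{P}$ and the sequence of regular
parametric submodels $\mathcal{P}_{\psi_j}$ is least favorable indeed.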

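Now, let us assume such a least favorable sequence and efficient estimator
$\hat\theta_n$ exist at $P_0 = P_{\theta_0,G_0}$ with $\theta_0 = f(\nu_0)$ and $f(\cdot)$ from (1.4) and (1.5)
continuously differentiable. By the chain rule for score functions from Proposition
2.1 the score function $\dot\ell(\nu_0,\mathcal{Q}_{\psi_j})$ for $\nu$ at $\nu_0$ within $\mathcal{Q}_{\psi_j}$ satisfies

(3.6)  $\dot\ell(\nu_0,\mathcal{Q}_{\psi_j}) = \dot f^T(\nu_0)\,\dot\ell(\theta_0,\mathcal{P}_{\psi_j})$

and hence the corresponding influence function $\tilde\ell(\nu_0,\mathcal{Q}_{\psi_j})$ satisfies

(3.7)  $\tilde\ell(\nu_0,\mathcal{Q}_{\psi_j}) = \big(\dot f^T(\nu_0)\,I(\theta_0,\mathcal{P}_{\psi_j})\,\dot f(\nu_0)\big)^{-1} \dot f^T(\nu_0)\,\dot\ell(\theta_0,\mathcal{P}_{\psi_j}).$

Let $\hat\nu_n$ be a locally regular estimator of $\nu$ at $P_0$ within the regular
semiparametric model $\mathcal{Q}$. By the convergence of $\dot\ell(\theta_0,\mathcal{P}_{\psi_j})$ to $\dot\ell(\theta_0,G_0,\mathcal{P})$ in $L_2^k(P_0)$,
the influence functions from (3.7) converge in $L_2^d(P_0)$ to

(3.8)  $\tilde\ell(\nu_0,G_0,\mathcal{Q}) = \big(\dot f^T(\nu_0)\,I(\theta_0,G_0,\mathcal{P})\,\dot f(\nu_0)\big)^{-1} \dot f^T(\nu_0)\,\dot\ell(\theta_0,G_0,\mathcal{P})$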

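and the argument leading to (3.5) yields the convergence

(3.9)  $\begin{pmatrix} \sqrt{n}\Big(\hat\nu_n - \nu_0 - \frac{1}{n}\sum_{i=1}^n \tilde\ell(X_i;\nu_0,G_0,\mathcal{Q})\Big) \\ \frac{1}{\sqrt{n}}\sum_{i=1}^n \tilde\ell(X_i;\nu_0,G_0,\mathcal{Q}) \end{pmatrix} \overset{\mathcal{D}}{\longrightarrow}_{P_0} \begin{pmatrix} R_{\mathcal{Q}} \\ Z_{\mathcal{Q}} \end{pmatrix}$

with $R_{\mathcal{Q}}$ and $Z_{\mathcal{Q}}$ independent. Note that $Z_{\mathcal{Q}}$ has a normal distribution with
mean 0 and covariance matrix

(3.10)  $I^{-1}(\nu_0,G_0,\mathcal{Q}) = \big(\dot f^T(\nu_0)\,I(\theta_0,G_0,\mathcal{P})\,\dot f(\nu_0)\big)^{-1}.$

Under an additional condition on $f(\cdot)$ we shall construct an estimator $\hat\nu_n$ of
$\nu$ based on $\hat\theta_n$ for which $R_{\mathcal{Q}}$ is degenerate. This construction of $\hat\nu_n$ will be
given in the next section together with a proof of its efficiency, and this will
complete the proof of our main result formulated as follows.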

Theorem 3.1. Let $\mathcal{P}$ from (1.1) be a regular semiparametric model with
$P_0 = P_{\theta_0,G_0} \in \mathcal{P}$, $\theta_0 = f(\nu_0)$, and $f(\cdot)$ from (1.4) and (1.5) continuously
differentiable. Furthermore, let $f(\cdot)$ have an inverse on $f(N)$ that is
differentiable with a bounded Jacobian. If there exists a least favorable sequence of
regular parametric submodels $\mathcal{P}_{\psi_j}$ and an asymptotically efficient estimator
$\hat\theta_n$ of $\theta$ satisfying (3.5) with $R_{\mathcal{P}} = 0$ a.s., then there exists a least favorable
sequence of regular parametric submodels $\mathcal{Q}_{\psi_j}$ of the restricted model $\mathcal{Q}$ from
(1.6) and an asymptotically efficient estimator $\hat\nu_n$ of $\nu$ satisfying (3.9) with
$R_{\mathcal{Q}} = 0$ a.s. and attaining the asymptotic information bound (3.10).

Note that the convolution result (3.9) and (3.8) also holds if the convergent
sequence of regular parametric submodels $\mathcal{P}_{\psi_j}$ is not least favorable, and
that it implies by the central limit theorem that the limit distribution of
$\sqrt{n}(\hat\nu_n - \nu_0)$ is the convolution of a normal distribution with mean 0 and
covariance matrix

(3.11)  $I^{-1}(\nu_0,G_0,\mathcal{Q}) = \big(\dot f^T(\nu_0)\,I(\theta_0,G_0,\mathcal{P})\,\dot f(\nu_0)\big)^{-1}$

and the distribution of $R_{\mathcal{Q}}$.

4. Efficient Estimator of the Parameter of Interest. There are
many ways of constructing efficient estimators in (semi)parametric models.
One of the common approaches is upgrading a $\sqrt{n}$-consistent estimator as
in Sections 2.5 and 7.8 of BKRW (1993). A somewhat different upgrading
approach is used in the following construction.


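Theorem 4.1. Consider the situation of Theorem 3.1. If the symmetric
positive definite $k \times k$-matrix $\hat I_n$ is a consistent estimator of $I(\theta,G,\mathcal{P})$ within
$\mathcal{P}$ and $\bar\nu_n$ is a $\sqrt{n}$-consistent estimator of $\nu$ within $\mathcal{Q}$, then

(4.1)  $\hat\nu_n = \bar\nu_n + \big(\dot f^T(\bar\nu_n)\,\hat I_n\,\dot f(\bar\nu_n)\big)^{-1} \dot f^T(\bar\nu_n)\,\hat I_n\,\big[\hat\theta_n - f(\bar\nu_n)\big]$

is efficient, i.e., it satisfies (3.9) with $R_{\mathcal{Q}} = 0$ a.s.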
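Proof. The continuity of $\dot f(\cdot)$ and the consistency of $\bar\nu_n$ and $\hat I_n$ imply that

(4.2)  $\hat K_n = \big(\dot f^T(\bar\nu_n)\,\hat I_n\,\dot f(\bar\nu_n)\big)^{-1} \dot f^T(\bar\nu_n)\,\hat I_n$

converges in probability under $P_0$ to

(4.3)  $K_0 = \big(\dot f^T(\nu_0)\,I(\theta_0,G_0,\mathcal{P})\,\dot f(\nu_0)\big)^{-1} \dot f^T(\nu_0)\,I(\theta_0,G_0,\mathcal{P}).$

This means that $\hat K_n$ consistently estimates $K_0$. Note that (3.8) and (3.3) give
$\tilde\ell(\nu_0,G_0,\mathcal{Q}) = K_0\,\tilde\ell(\theta_0,G_0,\mathcal{P})$. In view of (4.1), (3.8), (3.3),
and (3.5) with $R_{\mathcal{P}} = 0$ we obtain

(4.4)
$\sqrt{n}\Big(\hat\nu_n - \nu_0 - \frac{1}{n}\sum_{i=1}^n \tilde\ell(X_i;\nu_0,G_0,\mathcal{Q})\Big)$
$\quad = \sqrt{n}\Big(\bar\nu_n - \nu_0 + \hat K_n\big[\hat\theta_n - f(\bar\nu_n)\big] - \frac{1}{n}\sum_{i=1}^n K_0\,\tilde\ell(X_i;\theta_0,G_0,\mathcal{P})\Big)$
$\quad = \sqrt{n}\Big(\bar\nu_n - \nu_0 - \hat K_n\big[f(\bar\nu_n) - f(\nu_0)\big]\Big) + \big[\hat K_n - K_0\big]\frac{1}{\sqrt{n}}\sum_{i=1}^n \tilde\ell(X_i;\theta_0,G_0,\mathcal{P}) + o_p(1).$

By the consistency of $\hat K_n$ the second term at the right hand side of (4.4)
converges to 0 in probability under $P_0$ in view of the central limit theorem.
Because $f(\bar\nu_n) = f(\nu_0) + \dot f(\nu_0)(\bar\nu_n - \nu_0) + o_p(\bar\nu_n - \nu_0)$ holds and $K_0 \dot f(\nu_0)$
equals the $d \times d$ identity matrix, the first part of the right hand side of (4.4)
also converges to 0 in probability under $P_0$. □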
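The update in (4.1) is a single linear-algebra step once a preliminary estimate, an efficient estimate of $\theta$, and an information estimate are available. A minimal sketch follows; the function name `one_step_update` and its argument names are illustrative choices, and `f` and `f_dot` are assumed to be user-supplied callables returning $f(\nu)$ as a length-$k$ array and the Jacobian as a $k \times d$ array.

```python
import numpy as np

def one_step_update(nu_bar, theta_hat, info_hat, f, f_dot):
    """Evaluate (4.1): nu_bar + (J' I J)^{-1} J' I [theta_hat - f(nu_bar)]."""
    nu_bar = np.atleast_1d(np.asarray(nu_bar, dtype=float))
    theta_hat = np.asarray(theta_hat, dtype=float)
    info_hat = np.asarray(info_hat, dtype=float)
    # Jacobian of f at the preliminary estimate, as a k x d matrix
    J = np.asarray(f_dot(nu_bar), dtype=float).reshape(theta_hat.size, nu_bar.size)
    residual = theta_hat - np.asarray(f(nu_bar), dtype=float)
    # K_n from (4.2), obtained by solving a d x d linear system
    gain = np.linalg.solve(J.T @ info_hat @ J, J.T @ info_hat)
    return nu_bar + gain @ residual
```

For a linear map $f(\nu) = L\nu + \alpha$ the result does not depend on `nu_bar`, in line with Remark 4.1: with `L = [[1], [2]]`, `alpha = 0`, identity information and `theta_hat = [1, 2.5]`, any starting value gives 1.2.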


To complete the proof of Theorem 3.1 with the help of Theorem 4.1 we will
construct a $\sqrt{n}$-consistent estimator $\bar\nu_n$ of $\nu$ and subsequently a consistent
estimator $\hat I_n$ of $I(\theta,G,\mathcal{P})$. Let $\|\cdot\|$ be a Euclidean norm on $\mathbb{R}^k$. We choose
$\bar\nu_n$ in such a way that

(4.5)  $\| f(\bar\nu_n) - \hat\theta_n \| \le \inf_{\nu \in N} \| f(\nu) - \hat\theta_n \| + \frac{1}{n}$

holds. Of course, if the infimum is attained, we choose $\bar\nu_n$ as the minimizer.
By the triangle inequality and the $\sqrt{n}$-consistency of $\hat\theta_n$ we obtain

(4.6)  $\| f(\bar\nu_n) - f(\nu_0) \| \le \inf_{\nu \in N} \| f(\nu) - \hat\theta_n \| + \frac{1}{n} + \| f(\nu_0) - \hat\theta_n \| \le 2\,\| \hat\theta_n - f(\nu_0) \| + \frac{1}{n} = O_p\Big(\frac{1}{\sqrt{n}}\Big).$

The assumption from Theorem 3.1 that $f(\cdot)$ has an inverse on $f(N)$ that
is differentiable with a bounded Jacobian, suffices to conclude that (4.6)
guarantees $\sqrt{n}$-consistency of $\bar\nu_n$.
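In practice the near-minimizer in (4.5) has to be computed numerically. The text does not prescribe an algorithm; the sketch below uses plain Gauss-Newton iterations, which is an assumption made here for illustration. It is a local method, so it needs a reasonable starting value `nu_start` and does not by itself guarantee the global infimum in (4.5). The names are hypothetical.

```python
import numpy as np

def min_distance_estimate(theta_hat, f, f_dot, nu_start, n_iter=50):
    """Approximate a minimizer of ||f(nu) - theta_hat|| by Gauss-Newton steps."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    nu = np.atleast_1d(np.asarray(nu_start, dtype=float))
    for _ in range(n_iter):
        J = np.asarray(f_dot(nu), dtype=float).reshape(theta_hat.size, nu.size)
        residual = theta_hat - np.asarray(f(nu), dtype=float)
        # Least-squares solution of J * step = residual (linearised problem)
        step, *_ = np.linalg.lstsq(J, residual, rcond=None)
        nu = nu + step
    return nu
```

With $f(\nu) = (\nu, \nu^2)^T$ and `theta_hat = (0.5, 0.25)`, which lies exactly on the surface, the iterations started at 0.4 converge to 0.5.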

In constructing a consistent estimator of the Fisher information matrix
based on the given efficient estimator $\hat\theta_n$, we split the sample in blocks as
follows. Let $(k_n)$, $(\ell_n)$, and $(m_n)$ be sequences of integers such that $k_n =
\ell_n m_n$, $k_n/n \to \kappa$, $0 < \kappa < 1$, and $\ell_n \to \infty$, $m_n \to \infty$ hold as $n \to \infty$. For
$j = 1, \dots, \ell_n$ let $\hat\theta_{n,j}$ be the efficient estimator of $\theta$ based on the observations
$X_{(j-1)m_n+1}, \dots, X_{j m_n}$ and $\hat\theta_{n,0}$ be the efficient estimator of $\theta$ based on the
remaining observations $X_{k_n+1}, \dots, X_n$. Consider the "empirical" characteristic
function

(4.7)  $\hat\phi_n(t) = \frac{1}{\ell_n}\sum_{j=1}^{\ell_n} \exp\big\{ i t^T \sqrt{m_n}\,(\hat\theta_{n,j} - \hat\theta_{n,0}) \big\}, \quad t \in \mathbb{R}^k,$

which we rewrite as

(4.8)  $\hat\phi_n(t) = \exp\big\{ -i t^T \sqrt{m_n}\,(\hat\theta_{n,0} - \theta_0) \big\}\ \frac{1}{\ell_n}\sum_{j=1}^{\ell_n} \exp\big\{ i t^T \sqrt{m_n}\,(\hat\theta_{n,j} - \theta_0) \big\} = \exp\big\{ -i t^T \sqrt{m_n}\,(\hat\theta_{n,0} - \theta_0) \big\}\ \tilde\phi_n(t).$

In view of $m_n/(n - k_n) \to 0$ and (3.5) with $R_{\mathcal{P}} = 0$ a.s. we see that the
first factor at the right hand side of (4.8) converges to 1 as $n \to \infty$. The
efficiency of $\hat\theta_n$ in (3.5) with $R_{\mathcal{P}} = 0$ a.s. also implies

(4.9)  $E\big(\tilde\phi_n(t)\big) = E\big(\exp\big\{ i t^T \sqrt{m_n}\,(\hat\theta_{n,1} - \theta_0) \big\}\big) \to E\big(\exp\{ i t^T Z_{\mathcal{P}} \}\big)$

as $n \to \infty$, with $Z_{\mathcal{P}}$ normally distributed with mean 0 and covariance matrix
$I^{-1}(\theta_0,G_0,\mathcal{P})$. Some computation shows

(4.10)  $E\Big( \big| \tilde\phi_n(t) - E\,\tilde\phi_n(t) \big|^2 \Big) \le \frac{1}{\ell_n}.$

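It follows by Chebyshev's inequality that $\tilde\phi_n(t)$ and hence $\hat\phi_n(t)$ converges
under $P_0 = P_{\theta_0,G_0}$ to the characteristic function of $Z_{\mathcal{P}}$ at $t$,

(4.11)  $\hat\phi_n(t) \to_{P_0} E\big(\exp\{ i t^T Z_{\mathcal{P}} \}\big) = \exp\big\{ -\tfrac{1}{2}\, t^T I^{-1}(\theta_0,G_0,\mathcal{P})\, t \big\}.$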
For every $t \in \mathbb{R}^k$ we obtain

(4.12)  $-2 \log\big( \Re\,\hat\phi_n(t) \big) \to_{P_0} t^T I^{-1}(\theta_0,G_0,\mathcal{P})\, t.$

Choosing $k(k+1)/2$ appropriate values of $t$ we may obtain from (4.12)
an estimator of $I^{-1}(\theta_0,G_0,\mathcal{P})$ and hence of $I(\theta_0,G_0,\mathcal{P})$. Indeed, with $t$
equal to the unit vectors $u_i$ we obtain estimators of the diagonal elements
of $I^{-1}(\theta_0,G_0,\mathcal{P})$ and an estimator of its $(i,j)$ element is obtained via

$\log\big( \Re\,\hat\phi_n(u_i) \big) + \log\big( \Re\,\hat\phi_n(u_j) \big) - \log\big( \Re\,\hat\phi_n(u_i + u_j) \big).$

When needed, the resulting estimator of $I(\theta_0,G_0,\mathcal{P})$ can be made positive
definite by changing appropriate components of it by an asymptotically
negligible amount, while the symmetry is maintained.
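The recipe based on (4.7) and (4.12) can be sketched in a few lines. The code below is an illustration only: the function names are hypothetical, the block-wise estimates are assumed to have been computed beforehand, and the logarithm requires the real part of the empirical characteristic function to be positive at the evaluation points, which holds with probability tending to one but not for every finite sample.

```python
import numpy as np

def empirical_cf(block_estimates, reference_estimate, m):
    """Empirical characteristic function (4.7).

    block_estimates: array of shape (l, k), one efficient estimate per block
    reference_estimate: length-k estimate from the remaining observations
    m: common block size m_n
    """
    diffs = np.sqrt(m) * (np.asarray(block_estimates, dtype=float)
                          - np.asarray(reference_estimate, dtype=float))
    return lambda t: np.mean(np.exp(1j * (diffs @ np.asarray(t, dtype=float))))

def inverse_info_from_cf(phi, k):
    """Recover the k x k matrix A from phi(t) ~ exp(-t'At/2), as below (4.12)."""
    def quad(t):  # approximates the quadratic form t' A t
        return -2.0 * np.log(np.real(phi(t)))
    unit = np.eye(k)
    A = np.empty((k, k))
    for i in range(k):
        A[i, i] = quad(unit[i])
    for i in range(k):
        for j in range(i + 1, k):
            # same as log Re phi(u_i) + log Re phi(u_j) - log Re phi(u_i + u_j)
            A[i, j] = A[j, i] = 0.5 * (quad(unit[i] + unit[j]) - A[i, i] - A[j, j])
    return A
```

Applied to an exact normal characteristic function the second function returns the covariance matrix itself; applied to `empirical_cf(...)` it gives an estimate of $I^{-1}(\theta_0,G_0,\mathcal{P})$, which may still need the positive-definiteness adjustment mentioned above before it is inverted.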

Under a mild uniform integrability condition it has been shown by Klaassen
(1987), that existence of an efficient estimator $\hat\theta_n$ of $\theta$ in $\mathcal{P}$ implies the
existence of a consistent and $\sqrt{n}$-unbiased estimator of the efficient influence
function $\tilde\ell(\cdot;\theta,G,\mathcal{P})$. Basing this estimator on one half of the sample and
taking the average of this estimated efficient influence function at the
observations from the other half of the sample, we could have constructed another
estimator of the efficient Fisher information. However, this estimator would
have been more involved, and, moreover, it needs this extra uniformity
condition.

With the help of Theorem 4.1, the estimator $\bar\nu_n$ of $\nu$ from (4.5), and the
construction via (4.12) of an estimator $\hat I_n$ of the efficient Fisher information
we have completed our construction of an efficient estimator $\hat\nu_n$ as in (4.1)
of $\nu$. This estimator can be turned into an efficient estimator of $\theta = f(\nu)$
within the model $\mathcal{Q}$ from (1.6) by

(4.13)  $\tilde\theta_n = f(\hat\nu_n)$

with efficient influence function

(4.14)  $\tilde\ell(\theta_0,G_0,\mathcal{Q}) = \dot f(\nu_0)\,\tilde\ell(\nu_0,G_0,\mathcal{Q}) = \dot f(\nu_0)\big(\dot f^T(\nu_0)\,I(\theta_0,G_0,\mathcal{P})\,\dot f(\nu_0)\big)^{-1} \dot f^T(\nu_0)\,\dot\ell(\theta_0,G_0,\mathcal{P})$

and asymptotic information bound

(4.15)  $I^{-1}(\theta_0,G_0,\mathcal{Q}) = \dot f(\nu_0)\big(\dot f^T(\nu_0)\,I(\theta_0,G_0,\mathcal{P})\,\dot f(\nu_0)\big)^{-1} \dot f^T(\nu_0).$

Indeed, according to BKRW (1993) Section 2.3, $\tilde\theta_n$ is efficient for estimation
of $\theta$ under the additional information $\theta = f(\nu)$.
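The two information bounds (3.10) and (4.15) are straightforward to evaluate for a given Jacobian and information matrix. The following sketch, with the hypothetical name `information_bounds`, computes both; comparing the second one with the unconstrained bound $I^{-1}(\theta_0,G_0,\mathcal{P})$ shows how much the constraint gains.

```python
import numpy as np

def information_bounds(f_dot, info):
    """Return the bound (3.10) for nu and the bound (4.15) for theta = f(nu)."""
    J = np.asarray(f_dot, dtype=float)        # k x d Jacobian at nu_0
    info = np.asarray(info, dtype=float)      # k x k information matrix
    bound_nu = np.linalg.inv(J.T @ info @ J)  # (3.10), a d x d matrix
    bound_theta = J @ bound_nu @ J.T          # (4.15), a k x k matrix of rank d
    return bound_nu, bound_theta
```

For example, with identity information in dimension two and $\dot f = (1, 1)^T$ the bound for $\nu$ is 1/2, and the difference between the unconstrained bound and (4.15) has eigenvalues 0 and 1, so it is positive semidefinite.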
Remark 4.1. If $f(\cdot)$ is a linear function, i.e., $\theta = L\nu + \alpha$ holds with the
$k \times d$-matrix $L$ of maximum rank $d$, then

(4.16)  $\bar\nu_n = (L^T L)^{-1} L^T (\hat\theta_n - \alpha)$

attains the infimum at the right hand side of (4.5). So, the estimator (4.1)
becomes

(4.17)  $\hat\nu_n = \big(L^T \hat I_n L\big)^{-1} L^T \hat I_n\,(\hat\theta_n - \alpha)$

with efficient influence function (3.8) and asymptotic information bound
(3.10), both with $\dot f(\nu_0) = L$, and the estimator from (4.13) becomes

(4.18)  $\tilde\theta_n = L\big(L^T \hat I_n L\big)^{-1} L^T \hat I_n\,(\hat\theta_n - \alpha) + \alpha.$

Note that $\tilde\theta_n$ is the projection of $\hat\theta_n$ on the flat $\{\theta \in \mathbb{R}^k : \theta = L\nu + \alpha,\ \nu \in \mathbb{R}^d\}$
under the inner product determined by $\hat I_n$ (cf. Appendix A) and that the
covariance matrix of its limit distribution equals the asymptotic information
bound

(4.19)  $I^{-1}(\theta_0,G_0,\mathcal{Q}) = L\big(L^T I(\theta_0,G_0,\mathcal{P})\,L\big)^{-1} L^T.$

Another way to describe this submodel $\mathcal{Q}$ with $\theta = L\nu + \alpha$ is by linear
restrictions

(4.20)  $\mathcal{Q} = \{P_{L\nu+\alpha,G} : \nu \in N,\ G \in \mathcal{G}\} = \{P_{\theta,G} : R^T\theta = \beta,\ \theta \in \Theta,\ G \in \mathcal{G}\},$

where $R^T\alpha = \beta$ holds and the $k \times d$-matrix $L$ and the $k \times (k-d)$-matrix
$R$ are matching such that the columns of $L$ are orthogonal to those of $R$
and the $k \times k$-matrix $(L\ R)$ is of rank $k$. Note that the open subset $N$ of
$\mathbb{R}^d$ determines the open subset $\Theta$ of $\mathbb{R}^k$ and vice versa. See Cobb and Douglas
(1928), Stone (1954), Nyquist (1991), and Kim and Taylor (1995) for some
examples of estimation under linear restrictions.

In terms of the restrictions described by $R$ and $\beta$ the efficient estimator
$\tilde\theta_n$ of $\theta$ from (4.18) within the submodel $\mathcal{Q}$ can be rewritten as

(4.21)  $\tilde\theta_n = \hat\theta_n - \hat I_n^{-1} R\,\big(R^T \hat I_n^{-1} R\big)^{-1} \big(R^T \hat\theta_n - \beta\big),$

with asymptotic information bound

(4.22)  $L(L^T I L)^{-1} L^T = I^{-1} - I^{-1} R\,(R^T I^{-1} R)^{-1} R^T I^{-1}, \quad I = I(\theta_0,G_0,\mathcal{P}),$

as will be proved in Appendix A.
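The two descriptions of the restricted estimator, through $L$ and $\alpha$ as in (4.18) or through $R$ and $\beta$ as in (4.21), can be compared numerically. The sketch below is a small illustration with hypothetical function names; it assumes that the columns of $L$ and $R$ are orthogonal, that $(L\ R)$ has rank $k$, and that $\beta = R^T\alpha$.

```python
import numpy as np

def restricted_via_L(theta_hat, info, L, alpha):
    """Estimator (4.18): projection on {L nu + alpha} in the info inner product."""
    gain = np.linalg.solve(L.T @ info @ L, L.T @ info)
    return L @ gain @ (theta_hat - alpha) + alpha

def restricted_via_R(theta_hat, info, R, beta):
    """Estimator (4.21): correct theta_hat so that R' theta = beta holds."""
    info_inv = np.linalg.inv(info)
    correction = np.linalg.solve(R.T @ info_inv @ R, R.T @ theta_hat - beta)
    return theta_hat - info_inv @ R @ correction
```

Both functions return the same vector when the assumptions above hold, and the result satisfies the restriction $R^T\theta = \beta$ exactly, up to rounding.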
5. Examples. In this section we present five examples, which illustrate
our construction of (semi)parametrically efficient estimators. We shall
discuss location-scale, Gaussian copula, and semiparametric regression models,
and parametric models under linear restrictions.

Example 5.1. Coefficient of variation known 

Let $g(\cdot)$ be an absolutely continuous density on $(\mathbb{R}, \mathcal{B})$ with mean 0,
variance 1, and derivative $g'(\cdot)$, such that $\int [1 + x^2]\,(g'/g(x))^2 g(x)\,dx$ is finite.
Consider the location-scale family corresponding to $g(\cdot)$. Let there be given
efficient estimators $\bar\mu_n$ and $\bar\sigma_n$ of $\mu$ and $\sigma$, respectively, based on $X_1, \dots, X_n$,
which are i.i.d. with density $\sigma^{-1} g((\cdot - \mu)/\sigma)$. By $I_{ij}$ we denote the element
in the $i$th row and $j$th column of the matrix $I = \sigma^2 I(\theta,G,\mathcal{P})$, where the
Fisher information matrix $I(\theta,G,\mathcal{P})$ is as defined in (3.3) with $\theta = (\mu,\sigma)^T$
and $\mathcal{G} = \{g(\cdot)\}$. Some computation shows $I_{11} = \int (g'/g)^2 g$, $I_{12} = I_{21} =
\int x\,(g'/g(x))^2 g(x)\,dx$, and $I_{22} = \int [x g'/g(x) + 1]^2 g(x)\,dx$ exist and are finite;
cf. Section 1.2.3 of Hájek and Šidák (1967).

We consider the submodel with the coefficient of variation known to be
equal to a given constant $c = \sigma/\mu$ and with $\nu = \mu$ the parameter of interest.
Since in a parametric model the model itself is always least favorable, the
conditions of Theorem 4.1 are satisfied and the estimator $\hat\nu_n = \hat\mu_n$ of $\mu$ from
(4.1) with $\bar\nu_n = \bar\mu_n$, $\hat\theta_n = (\bar\mu_n, \bar\sigma_n)^T$, and $\hat I_n = \bar\sigma_n^{-2} I$ is efficient and some
computation shows

(5.1)  $\hat\mu_n = \big(I_{11} + 2c I_{12} + c^2 I_{22}\big)^{-1} \big[ (I_{11} + c I_{12})\,\bar\mu_n + (I_{12} + c I_{22})\,\bar\sigma_n \big].$

In case the density $g(\cdot)$ is symmetric around 0, the Fisher information matrix
is diagonal and $\hat\mu_n$ from (5.1) becomes

(5.2)  $\hat\mu_n = \big(I_{11} + c^2 I_{22}\big)^{-1} \big[ I_{11}\,\bar\mu_n + c I_{22}\,\bar\sigma_n \big].$

In the normal case with $g(\cdot)$ the standard normal density we have $I_{11} = 1$ and
$I_{22} = 2$, and $\hat\mu_n$ reduces to

(5.3)  $\hat\mu_n = (1 + 2c^2)^{-1} \big[ \bar\mu_n + 2c\,\bar\sigma_n \big]$

with $\bar\mu_n$ and $\bar\sigma_n$ equal to e.g. the sample mean and the sample standard
deviation, respectively; cf. Khan (1968), Gleser and Healy (1976), and Khan
(2015).
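Formula (5.1) is simple enough to code directly. The sketch below is illustrative: the function name is hypothetical, and the default values $I_{11} = 1$, $I_{12} = 0$, $I_{22} = 2$ are the ones obtained from the definitions of $I_{ij}$ for the standard normal density (an assumption that has to be replaced for other shapes $g$).

```python
def cv_known_mean_estimate(mu_bar, sigma_bar, c, I11=1.0, I12=0.0, I22=2.0):
    """Estimator (5.1) of mu when the coefficient of variation c is known.

    mu_bar, sigma_bar: efficient estimates of mu and sigma in the full model
    I11, I12, I22: entries of sigma^2 times the Fisher information matrix;
                   the defaults are the standard normal values
    """
    denom = I11 + 2 * c * I12 + c ** 2 * I22
    return ((I11 + c * I12) * mu_bar + (I12 + c * I22) * sigma_bar) / denom
```

When the unrestricted estimates already satisfy the constraint, for example `mu_bar = 2.0`, `sigma_bar = 1.0` and `c = 0.5`, the function returns `mu_bar` unchanged; with `sigma_bar = 1.3` instead it returns 2.2.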

Example 5.2. Gaussian copula models 
Let

$X_1 = (X_{1,1}, \dots, X_{1,m})^T, \dots, X_n = (X_{n,1}, \dots, X_{n,m})^T$

be i.i.d. copies of $X = (X_1, \dots, X_m)^T$. For $i = 1, \dots, m$, the marginal
distribution function of $X_i$ is continuous and will be denoted by $F_i$. It is assumed
that $(\Phi^{-1}(F_1(X_1)), \dots, \Phi^{-1}(F_m(X_m)))^T$ has an $m$-dimensional normal
distribution with mean 0 and positive definite correlation matrix $C(\theta)$, where
$\Phi$ denotes the one-dimensional standard normal distribution function. Here
the parameter of interest $\theta$ is the vector in $\mathbb{R}^{m(m-1)/2}$ that summarizes all
correlation coefficients $\rho_{rs}$, $1 \le r < s \le m$. We will set this general Gaussian
copula model as our semiparametric starting model $\mathcal{P}$, i.e.,

(5.4)  $\mathcal{P} = \{P_{\theta,G} : \theta = (\rho_{12}, \dots, \rho_{(m-1)m})^T,\ G = (F_1(\cdot), \dots, F_m(\cdot)) \in \mathcal{G}\}.$

The unknown continuous marginal distributions are the nuisance parameters
collected as $G \in \mathcal{G}$.

Theorem 3.1 of Klaassen and Wellner (1997) shows that the normal scores
rank correlation coefficient is semiparametrically efficient in $\mathcal{P}$ for the
2-dimensional case with normal marginals with unknown variances constituting
a least favorable parametric submodel. As Hoff et al. (2014) explain at
the end of their Section 1 and in their Section 4, their Theorem 4.1 proves
that normal marginals with unknown, possibly unequal variances constitute
a least favorable parametric submodel, also for the general $m$-dimensional
case. Since the maximum likelihood estimators are efficient for the parameters
of a multivariate normal distribution, the sample correlation coefficients
are efficient for estimation of the correlation coefficients based on
multivariate normal observations. But each sample correlation coefficient
and hence its efficient influence function involve only two components of the
multivariate normal observations. Apparently, the other components of the
multivariate normal observations carry no information about the value of
the respective correlation coefficient. Effectively, for each correlation coefficient
we are in the 2-dimensional case and invoking again Theorem 3.1 of
Klaassen and Wellner (1997) we see that also in the general $m$-dimensional
case the normal scores rank correlation coefficients are semiparametrically
efficient. They are defined as



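(5.5)  $\hat\rho_n^{(rs)} = \dfrac{\frac{1}{n}\sum_{j=1}^n \Phi^{-1}\big(\frac{n}{n+1} F_n^{(r)}(X_{j,r})\big)\,\Phi^{-1}\big(\frac{n}{n+1} F_n^{(s)}(X_{j,s})\big)}{\frac{1}{n}\sum_{j=1}^n \big[\Phi^{-1}\big(\frac{j}{n+1}\big)\big]^2}$

with $F_n^{(r)}$ and $F_n^{(s)}$ being the marginal empirical distributions of $F_r$ and
$F_s$, respectively, $1 \le r < s \le m$. The Van der Waerden or normal scores
rank correlation coefficient $\hat\rho_n^{(rs)}$ from (5.5) is a semiparametrically efficient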
estimator of $\rho_{rs}$ with efficient influence function

(5.6)  $\tilde\ell_{\rho_{rs}}(X_r, X_s) = \Phi^{-1}(F_r(X_r))\,\Phi^{-1}(F_s(X_s)) - \tfrac{1}{2}\rho_{rs}\Big\{ \big[\Phi^{-1}(F_r(X_r))\big]^2 + \big[\Phi^{-1}(F_s(X_s))\big]^2 \Big\}.$

This means that

(5.7)  $\hat\theta_n = \big(\hat\rho_n^{(12)}, \dots, \hat\rho_n^{((m-1)m)}\big)^T$

efficiently estimates $\theta$ with efficient influence function

(5.8)  $\tilde\ell(X;\theta,G,\mathcal{P}) = \big(\tilde\ell_{\rho_{12}}(X_1,X_2), \dots, \tilde\ell_{\rho_{(m-1)m}}(X_{m-1},X_m)\big)^T.$
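A normal scores rank correlation coefficient can be computed from ranks alone. The sketch below assumes the usual Van der Waerden form of the statistic, namely the average product of the scores $\Phi^{-1}(\mathrm{rank}/(n+1))$ of the two coordinates divided by the average squared score; this normalisation, the function name, and the no-ties assumption are choices made here for illustration. It uses `statistics.NormalDist().inv_cdf` from the Python standard library for $\Phi^{-1}$.

```python
import numpy as np
from statistics import NormalDist

def normal_scores_rank_correlation(x, y):
    """Van der Waerden (normal scores) rank correlation of two samples.

    Assumes no ties, as is the case with probability one for continuous
    marginal distributions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    inv_cdf = NormalDist().inv_cdf
    # Ranks 1..n of each observation within its own coordinate
    rank_x = np.argsort(np.argsort(x)) + 1
    rank_y = np.argsort(np.argsort(y)) + 1
    # Normal scores Phi^{-1}(j / (n + 1)), j = 1..n
    scores = np.array([inv_cdf(j / (n + 1)) for j in range(1, n + 1)])
    zx = scores[rank_x - 1]
    zy = scores[rank_y - 1]
    return float(np.mean(zx * zy) / np.mean(scores ** 2))
```

Because only ranks enter, the statistic is invariant under increasing transformations of either coordinate: it equals 1 for any pair of samples that are increasing functions of each other and -1 (up to rounding) when one is a decreasing function of the other.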

Subexample 5.2.1. Exchangeable Gaussian copula 

The exchangeable $m$-variate Gaussian copula model

(5.9)  $\mathcal{Q} = \{P_{1_k\rho,G} : \rho \in (-1/(m-1), 1),\ G \in \mathcal{G}\} \subset \mathcal{P}$

is a submodel of the Gaussian copula model $\mathcal{P}$ with a one-dimensional
parameter of interest $\nu = \rho$. In this submodel all correlation coefficients have
the same value $\rho$. So, $\theta = 1_k\rho$ with $1_k$ indicating the vector of ones of
dimension $k = m(m-1)/2$. In order to construct an efficient estimator of
$\rho$ within $\mathcal{Q}$ along the lines of Section 4, in particular Remark 4.1, we first
apply (4.16) with $\alpha = 0$ and $L = 1_k$ to obtain the (natural) $\sqrt{n}$-consistent
estimator

(5.10)  $\bar\rho_n = \bar\nu_n = \frac{1}{k}\sum_{r=1}^{m-1}\sum_{s=r+1}^{m} \hat\rho_n^{(rs)}.$

For $\theta = 1_k\rho$ we get by simple but tedious calculations (see the
Supplementary Material)

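(5.11)  $E\,\tilde\ell_{\rho_{rs}}\tilde\ell_{\rho_{tu}} = \begin{cases} (1-\rho^2)^2 & \text{if } |\{r,s\} \cap \{t,u\}| = 2, \\ \tfrac{1}{2}(1-\rho)^2 \rho\,(2+3\rho) & \text{if } |\{r,s\} \cap \{t,u\}| = 1, \\ 2(1-\rho)^2 \rho^2 & \text{if } |\{r,s\} \cap \{t,u\}| = 0. \end{cases}$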

It makes sense to estimate $I^{-1}(1_k\rho,G,\mathcal{P})$ by substituting $\bar\rho_n$ for $\rho$ in (5.11), to
compute the inverse of the resulting matrix, and to choose this inverse as the
estimator $\hat I_n$. To this end, we note that for every pair $\{r,s\}$, $1 \le r \ne s \le m$,
there are $2(m-2)$ pairs $\{t,u\}$ having one element in common and
there are $\tfrac{1}{2}(m-2)(m-3)$ pairs $\{t,u\}$ having no elements in common.
Hence, the sum of the components of each column vector of $I^{-1}(1_k\rho,G,\mathcal{P})$
is $(1-\rho)^2(1+(m-1)\rho)^2$. Each matrix with the components of each column
vector adding to 1 has the property that the sum of all row vectors equals
the vector with all components equal to 1, and hence the components of
each column vector of its inverse also add up to 1. This implies

$1_k^T \hat I_n = (1-\bar\rho_n)^{-2}\,(1+(m-1)\bar\rho_n)^{-2}\, 1_k^T$

and hence by (4.17)

(5.12)  $\hat\rho_n = \big(1_k^T \hat I_n 1_k\big)^{-1} 1_k^T \hat I_n \hat\theta_n = \frac{1}{k}\,1_k^T \hat\theta_n = \frac{1}{k}\sum_{r=1}^{m-1}\sum_{s=r+1}^{m} \hat\rho_n^{(rs)} = \bar\rho_n$

attains the asymptotic information bound (cf. (3.10))

(5.13)  $\big(1_k^T I(1_k\rho,G,\mathcal{P})\,1_k\big)^{-1} = \frac{2}{m(m-1)}\,(1-\rho)^2\,(1+(m-1)\rho)^2.$


Hoff et al. (2014) proved the efficiency of the pseudo-likelihood estimator for
$\rho$ in dimension $m = 4$. Segers et al. (2014) extended this result to general
$m$ and presented the efficient lower bounds for $m = 3$ and $m = 4$ in their
Example 5.3. However, their maximum pseudo-likelihood estimator is not
as explicit as our (5.12).
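The column-sum argument of this subexample can be verified numerically for given $m$ and $\rho$. The following sketch builds the matrix with entries (5.11), indexed by the pairs $\{r,s\}$, and is meant purely as a check; the function name is hypothetical.

```python
import numpy as np
from itertools import combinations

def exchangeable_inverse_info(m, rho):
    """Matrix with entries (5.11), rows and columns indexed by pairs r < s."""
    pairs = list(combinations(range(m), 2))
    k = len(pairs)
    V = np.empty((k, k))
    for a, p in enumerate(pairs):
        for b, q in enumerate(pairs):
            common = len(set(p) & set(q))  # number of shared indices
            if common == 2:
                V[a, b] = (1 - rho ** 2) ** 2
            elif common == 1:
                V[a, b] = 0.5 * (1 - rho) ** 2 * rho * (2 + 3 * rho)
            else:
                V[a, b] = 2 * (1 - rho) ** 2 * rho ** 2
    return V
```

For `m = 4` and `rho = 0.3` every column sums to $(1-\rho)^2(1+(m-1)\rho)^2 = 1.7689$, and the bound $(1_k^T I 1_k)^{-1}$, with $I$ the inverse of this matrix, equals that column sum divided by $k = 6$.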


Subexample 5.2.2. Four-dimensional circular Gaussian copula 

A particular, one-dimensional parameter type of four-dimensional
circular Gaussian copula model has been studied by Hoff et al. (2014) and
Segers et al. (2014). It is defined by its correlation matrix

(5.14)  $\begin{pmatrix} 1 & \rho & \rho^2 & \rho \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho & \rho^2 & \rho & 1 \end{pmatrix}.$

Our semiparametric starting model $\mathcal{P}$ is the same as in (5.4) with $m = 4$,
but with the components of $\theta$ rearranged as follows

$\theta = (\rho_{12}, \rho_{14}, \rho_{23}, \rho_{34}, \rho_{13}, \rho_{24})^T.$

Now, with $f(\rho) = (\rho, \rho, \rho, \rho, \rho^2, \rho^2)^T$ the present circular Gaussian
submodel $\mathcal{Q}$ may be written as

$\mathcal{Q} = \{P_{f(\rho),G} : \rho \in (-1, 1),\ G \in \mathcal{G}\}.$
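In order to construct an efficient estimator of $\rho$ within $\mathcal{Q}$ along the lines of
Theorem 4.1, we propose as a $\sqrt{n}$-consistent estimator of $\rho$

(5.15)  $\bar\rho_n = \tfrac{2}{3}\,\bar\rho_{n,1} + \tfrac{1}{3}\,\mathrm{sign}(\bar\rho_{n,1})\,\big|\bar\rho_{n,2}\big|^{1/2}, \quad \bar\rho_{n,1} = \tfrac{1}{4}\big(\hat\rho_n^{(12)} + \hat\rho_n^{(14)} + \hat\rho_n^{(23)} + \hat\rho_n^{(34)}\big), \quad \bar\rho_{n,2} = \tfrac{1}{2}\big(\hat\rho_n^{(13)} + \hat\rho_n^{(24)}\big).$
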

As in (5.11) we get by simple but tedious calculations (see the Supplementary 
Material) 


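(5.16)  $I^{-1}(f(\rho),G,\mathcal{P}) = \tfrac{1}{2}(1-\rho^2)^2 \begin{pmatrix} 2 & \rho^2 & \rho^2 & 2\rho^2 & \rho(2+\rho^2) & \rho(2+\rho^2) \\ \rho^2 & 2 & 2\rho^2 & \rho^2 & \rho(2+\rho^2) & \rho(2+\rho^2) \\ \rho^2 & 2\rho^2 & 2 & \rho^2 & \rho(2+\rho^2) & \rho(2+\rho^2) \\ 2\rho^2 & \rho^2 & \rho^2 & 2 & \rho(2+\rho^2) & \rho(2+\rho^2) \\ \rho(2+\rho^2) & \rho(2+\rho^2) & \rho(2+\rho^2) & \rho(2+\rho^2) & 2(1+\rho^2)^2 & 4\rho^2 \\ \rho(2+\rho^2) & \rho(2+\rho^2) & \rho(2+\rho^2) & \rho(2+\rho^2) & 4\rho^2 & 2(1+\rho^2)^2 \end{pmatrix},$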

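which has inverse

(5.17)  $I(f(\rho),G,\mathcal{P}) = \tfrac{1}{2}(1-\rho^2)^{-4} \begin{pmatrix} \rho^4+2 & 3\rho^2 & 3\rho^2 & \rho^4+2\rho^2 & -(\rho^3+2\rho) & -(\rho^3+2\rho) \\ 3\rho^2 & \rho^4+2 & \rho^4+2\rho^2 & 3\rho^2 & -(\rho^3+2\rho) & -(\rho^3+2\rho) \\ 3\rho^2 & \rho^4+2\rho^2 & \rho^4+2 & 3\rho^2 & -(\rho^3+2\rho) & -(\rho^3+2\rho) \\ \rho^4+2\rho^2 & 3\rho^2 & 3\rho^2 & \rho^4+2 & -(\rho^3+2\rho) & -(\rho^3+2\rho) \\ -(\rho^3+2\rho) & -(\rho^3+2\rho) & -(\rho^3+2\rho) & -(\rho^3+2\rho) & 2\frac{\rho^6+\rho^4+1}{\rho^4+1} & 2\frac{\rho^6+2\rho^2}{\rho^4+1} \\ -(\rho^3+2\rho) & -(\rho^3+2\rho) & -(\rho^3+2\rho) & -(\rho^3+2\rho) & 2\frac{\rho^6+2\rho^2}{\rho^4+1} & 2\frac{\rho^6+\rho^4+1}{\rho^4+1} \end{pmatrix}.$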

Substituting $\bar\rho_n$ into (5.17) we obtain a $\sqrt{n}$-consistent estimator of $I(f(\rho),G,\mathcal{P})$.
In view of $\dot f(\rho) = (1, 1, 1, 1, 2\rho, 2\rho)^T$ we have

$\dot f^T(\rho)\,I(f(\rho),G,\mathcal{P}) = (1-\rho^2)^{-3}\,\big(1+\rho^2,\ 1+\rho^2,\ 1+\rho^2,\ 1+\rho^2,\ -2\rho,\ -2\rho\big).$


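Consequently the asymptotic lower bound for estimation of $\rho$ within $\mathcal{Q}$
equals

(5.18)  $\big[\dot f^T(\rho)\,I(f(\rho),G,\mathcal{P})\,\dot f(\rho)\big]^{-1} = \tfrac{1}{4}(1-\rho^2)^2.$

Substituting $\bar\rho_n$ for $\rho$ we obtain as the efficient estimator from Theorem 4.1

(5.19)  $\hat\rho_n = \bar\rho_n + \dfrac{1+\bar\rho_n^2}{1-\bar\rho_n^2}\,\big(\bar\rho_{n,1} - \bar\rho_n\big) - \dfrac{\bar\rho_n}{1-\bar\rho_n^2}\,\big(\bar\rho_{n,2} - \bar\rho_n^2\big).$
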
Hoff et al. (2014) have shown that the pseudo-likelihood estimator is not
efficient in this case. Segers et al. (2014) have established the asymptotic 
lower bound (5.18) and have constructed an alternative, efficient, one-step 
updating estimator suggesting the pseudo-maximum likelihood estimator as 
the preliminary estimator. 
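The bound (5.18) and the one-step form of the estimator can be checked numerically from the asymptotic covariance matrix of the six normal scores rank correlations. The sketch below is an illustration with hypothetical function names; the matrix entries follow (5.16) with the ordering $(\rho_{12}, \rho_{14}, \rho_{23}, \rho_{34}, \rho_{13}, \rho_{24})$, and the closed form in `circular_one_step` is the update (4.1) written out for $f(\rho) = (\rho,\rho,\rho,\rho,\rho^2,\rho^2)^T$.

```python
import numpy as np

def circular_inverse_info(rho):
    """Asymptotic covariance matrix (5.16) in the circular submodel."""
    x = rho ** 2
    a = rho * (2 + x)
    top = np.array([[2, x, x, 2 * x],
                    [x, 2, 2 * x, x],
                    [x, 2 * x, 2, x],
                    [2 * x, x, x, 2]], dtype=float)
    bottom = np.array([[2 * (1 + x) ** 2, 4 * x],
                       [4 * x, 2 * (1 + x) ** 2]], dtype=float)
    N = np.block([[top, np.full((4, 2), a)],
                  [np.full((2, 4), a), bottom]])
    return 0.5 * (1 - x) ** 2 * N

def circular_one_step(rho_bar, rho_bar_1, rho_bar_2):
    """Update (4.1) in closed form; rho_bar_1 and rho_bar_2 are the averages of
    the four estimates of rho and of the two estimates of rho^2."""
    d = 1 - rho_bar ** 2
    return (rho_bar
            + (1 + rho_bar ** 2) / d * (rho_bar_1 - rho_bar)
            - rho_bar / d * (rho_bar_2 - rho_bar ** 2))
```

Inverting `circular_inverse_info(rho)` and forming $(\dot f^T I \dot f)^{-1}$ with $\dot f = (1,1,1,1,2\rho,2\rho)^T$ reproduces $\tfrac{1}{4}(1-\rho^2)^2$, and the closed form agrees with the generic matrix expression of (4.1) for arbitrary inputs.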

Example 5.3. Partial spline linear regression 

Here the observations are realizations of i.i.d. copies of the random vector
$X = (Y, Z^T, U^T)^T$ with $Y$, $Z$, and $U$ 1-dimensional, $k$-dimensional, and
$p$-dimensional random vectors with the structure

(5.20)  $Y = \theta^T Z + \psi(U) + \varepsilon,$

where the measurement error $\varepsilon$ is independent of $Z$ and $U$, has mean 0,
finite variance, and finite Fisher information for location, and where $\psi(\cdot)$ is
a real valued function on $\mathbb{R}^p$. Schick (1993) calls this partly linear additive
regression, BKRW (1993) mention it as partial spline regression, whereas
Cheng et al. (2015) speak of the partial smoothing spline model.
Under the regularity conditions of his Theorem 8.1 Schick (1993) presents
an efficient estimator of $\theta$ and a consistent estimator of $I(\theta,G,\mathcal{P})$.
Consequently our Theorem 4.1 may be applied directly in order to obtain an
efficient estimator of $\nu$ in appropriate submodels with $\theta = f(\nu)$ without our
construction of an estimator of $I(\theta,G,\mathcal{P})$ via characteristic functions. Note
that for submodels with $\theta$ restricted to a linear subspace, $\theta = L\nu$ say, our
approach is not needed, since the reparametrization $Y = \nu^T L^T Z + \psi(U) + \varepsilon$
brings the estimation problem back to its original (5.20).

Example 5.4. Multivariate normal with common mean 

Let $\mathcal{G}$ be the collection of nonsingular $k \times k$-covariance matrices
and let the parametric starting model be the collection of nondegenerate
normal distributions with mean vector $\theta$ and covariance matrix $\Sigma$,

$$\mathcal{P} = \left\{P_{\theta,\Sigma} : \theta \in \mathbb{R}^k,\ \Sigma \in \mathcal{G}\right\}. \tag{5.21}$$

Efficient estimators of $\theta$ and $\Sigma$ are the sample mean $\bar X_n = n^{-1} \sum_{i=1}^n X_i$ and
the sample covariance matrix $\hat\Sigma_n = (n-1)^{-1} \sum_{i=1}^n (X_i - \bar X_n)(X_i - \bar X_n)^T$,
respectively. Note that $\bar X_n$ attains the finite sample Cramér-Rao bound and
the asymptotic information bound with $I(\theta, \Sigma, \mathcal{P}) = \Sigma^{-1}$.

The parametric submodel we consider is

$$\mathcal{Q} = \left\{P_{\mu 1_k,\Sigma} : \mu \in \mathbb{R},\ \Sigma \in \mathcal{G}\right\}. \tag{5.22}$$
In view of (4.17) and (3.11)

$$\hat\mu_n = \left(1_k^T \hat\Sigma_n^{-1} 1_k\right)^{-1} 1_k^T \hat\Sigma_n^{-1} \bar X_n \tag{5.23}$$

is an efficient estimator of $\mu$ within $\mathcal{Q}$ that attains the asymptotic lower
bound $\left(1_k^T \Sigma^{-1} 1_k\right)^{-1}$.


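In case the covariance matrix $\Sigma$ is diagonal with
its variances denoted by $\sigma_1^2, \dots, \sigma_k^2$, we are dealing with the Graybill-Deal
model as presented by Van Eeden (2006) on her page 88. With $\bar X_{i,n} = n^{-1} \sum_{j=1}^n X_{j,i}$,
$S_{i,n}^2 = (n-1)^{-1} \sum_{j=1}^n (X_{j,i} - \bar X_{i,n})^2$, and $\hat\Sigma_n = \mathrm{diag}\left(S_{1,n}^2, \dots, S_{k,n}^2\right)$ we
obtain the Graybill-Deal estimator

$$\hat\mu_n = \frac{\sum_{i=1}^k \bar X_{i,n} / S_{i,n}^2}{\sum_{i=1}^k 1 / S_{i,n}^2} \tag{5.24}$$

with asymptotic lower bound

$$\left(1_k^T \Sigma^{-1} 1_k\right)^{-1} = \frac{1}{\sum_{i=1}^k 1/\sigma_i^2}.$$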
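
To make the two common-mean estimators concrete, here is a minimal Python sketch. It assumes that (5.23) is the generalized least squares combination $(1_k^T \hat\Sigma_n^{-1} 1_k)^{-1} 1_k^T \hat\Sigma_n^{-1} \bar X_n$ and that the Graybill-Deal estimator is its inverse-variance weighted special case; the function names `common_mean_gls` and `graybill_deal` and the simulated data are hypothetical.

```python
import numpy as np

def common_mean_gls(X):
    """Common-mean estimate using the full sample covariance matrix."""
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)        # normalised by n - 1
    ones = np.ones(X.shape[1])
    w = np.linalg.solve(S, ones)       # S^{-1} 1_k
    return (w @ xbar) / (w @ ones)

def graybill_deal(X):
    """Inverse-variance weighted average of the coordinate means."""
    xbar = X.mean(axis=0)
    s2 = X.var(axis=0, ddof=1)         # coordinatewise sample variances
    return np.sum(xbar / s2) / np.sum(1.0 / s2)

# Simulated illustration: common mean 5, independent coordinates.
rng = np.random.default_rng(0)
sigma = np.array([1.0, 2.0, 4.0])
X = 5.0 + sigma * rng.standard_normal((2000, 3))
print(common_mean_gls(X), graybill_deal(X))
print("asymptotic lower bound:", 1.0 / np.sum(1.0 / sigma ** 2))
```

When the sample covariance matrix happens to be exactly diagonal the two functions return the same value; otherwise they differ by the contribution of the estimated off-diagonal entries.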


Example 5.5. Restricted maximum likelihood estimator 

Maximum likelihood estimation of the generalized linear model under linear
restrictions on the parameters is done in Nyquist (1991) via an iterative
procedure using a penalty function. Kim and Taylor (1995) introduce the
restricted EM algorithm for maximum likelihood estimation under linear
restrictions. Our approach as described in Remark 4.1 with $\hat\theta_n$ a(n unrestricted)
maximum likelihood estimator avoids such iterative procedures.


APPENDIX A: ADDITIONAL PROOFS 

In this appendix proofs will be presented of (4.21) and (4.22).

Since $\hat I_n$ has been chosen to be symmetric and positive definite, $x^T \hat I_n y$, $x, y \in \mathbb{R}^k$,
is an inner product on $\mathbb{R}^k$. Define the $k \times k$-matrices $\Pi_{n,L}$ and $\Pi_{n,R}$ by

$$\Pi_{n,L} = L \left(L^T \hat I_n L\right)^{-1} L^T \hat I_n,$$

$$\Pi_{n,R} = \hat I_n^{-1} R \left(R^T \hat I_n^{-1} R\right)^{-1} R^T. \tag{A.1}$$

With the above inner product these matrices are projection matrices on
the linear subspaces spanned by the columns of $L$ and $\hat I_n^{-1} R$, respectively.
Indeed, $\Pi_{n,L}\Pi_{n,L} = \Pi_{n,L}$, $\Pi_{n,R}\Pi_{n,R} = \Pi_{n,R}$, $(x - \Pi_{n,L}x)^T \hat I_n \Pi_{n,L}x = 0$, $x \in \mathbb{R}^k$,
$(y - \Pi_{n,R}y)^T \hat I_n \Pi_{n,R}y = 0$, $y \in \mathbb{R}^k$, $\Pi_{n,L}Lx = Lx$, $x \in \mathbb{R}^d$,
and $\Pi_{n,R}\hat I_n^{-1}Ry = \hat I_n^{-1}Ry$, $y \in \mathbb{R}^{k-d}$, hold. The linear subspaces spanned by
the columns of $L$ and $\hat I_n^{-1}R$ have dimensions $d$ and $k - d$, respectively, since
the matrices $(L, R)$ and $\hat I_n$ are nonsingular. Moreover, these linear subspaces
are orthogonal in view of $L^T \hat I_n \hat I_n^{-1} R = L^T R = 0$. This implies

$$\Pi_{n,L}x + \Pi_{n,R}x = x, \quad x \in \mathbb{R}^k. \tag{A.2}$$

Combining (A.1), (A.2), and (4.18) we obtain (4.21) and, by the consistency
of $\hat I_n$, (4.22).
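
The projection identities used in this appendix are easy to verify numerically. The following Python sketch draws a hypothetical pair $L$, $R$ with $L^T R = 0$ and a random symmetric positive definite matrix standing in for the information estimate, then checks idempotence, orthogonality in the weighted inner product, and the decomposition (A.2). All names and dimensions here are illustrative assumptions.

```python
import numpy as np

def projections(L, R, I_n):
    """Return the matrices Pi_{n,L} and Pi_{n,R} of (A.1)."""
    I_inv = np.linalg.inv(I_n)
    P_L = L @ np.linalg.solve(L.T @ I_n @ L, L.T @ I_n)
    P_R = I_inv @ R @ np.linalg.solve(R.T @ I_inv @ R, R.T)
    return P_L, P_R

rng = np.random.default_rng(1)
k, d = 5, 2
# Orthonormal basis split into two blocks, so that L^T R = 0 by construction.
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
L = Q[:, :d] @ rng.standard_normal((d, d))   # k x d, same column space as Q[:, :d]
R = Q[:, d:]                                 # k x (k - d)
A = rng.standard_normal((k, k))
I_n = A @ A.T + k * np.eye(k)                # symmetric positive definite

P_L, P_R = projections(L, R, I_n)
print("idempotent:", np.allclose(P_L @ P_L, P_L), np.allclose(P_R @ P_R, P_R))
print("orthogonal ranges:", np.allclose(P_L.T @ I_n @ P_R, 0.0))
print("(A.2) holds:", np.allclose(P_L + P_R, np.eye(k)))
```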
Computations needed for (5.11) and (5.16) are collected as supplementary
material.

Supplementary Material for
"Semiparametrically Efficient Estimation
of Constrained Euclidean Parameters"


In this supplement we present the computational details for (5.11) and 
(5.16) presented in Example 5.2. Since our computations will be based on 
fourth moments of multivariate normal random variables, we consider 


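$$\begin{pmatrix} Z_a \\ Z_b \\ Z_c \\ Z_d \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_{ab} & \rho_{ac} & \rho_{ad} \\ \rho_{ba} & 1 & \rho_{bc} & \rho_{bd} \\ \rho_{ca} & \rho_{cb} & 1 & \rho_{cd} \\ \rho_{da} & \rho_{db} & \rho_{dc} & 1 \end{pmatrix} \right).$$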


The following fourth moments of $Z$ can be obtained by straightforward
computations:

• $E(Z_a^4) = 3$

• $E(Z_a^3 Z_b) = 3\rho_{ab}$

• $E(Z_a^2 Z_b^2) = 1 + 2\rho_{ab}^2$

• $E(Z_a^2 Z_b Z_c) = \rho_{bc} + 2\rho_{ab}\rho_{ac}$

• $E(Z_a Z_b Z_c Z_d) = \rho_{ab}\rho_{cd} + \rho_{ac}\rho_{bd} + \rho_{ad}\rho_{bc}$.
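
These moments are all instances of the pairing rule for centered Gaussian vectors: a fourth moment is the sum, over the three ways of splitting the four indices into two pairs, of the products of the corresponding covariances. The Python sketch below enumerates the pairings and compares the result with the five closed forms above in exact rational arithmetic. The helper names and the particular correlation values are hypothetical.

```python
from fractions import Fraction as F

def pairings(idx):
    """Yield all ways of splitting the index list into unordered pairs."""
    if not idx:
        yield []
        return
    first, rest = idx[0], idx[1:]
    for n, other in enumerate(rest):
        remaining = rest[:n] + rest[n + 1:]
        for tail in pairings(remaining):
            yield [(first, other)] + tail

def moment(R, idx):
    """E[prod of Z_i, i in idx] for a centered Gaussian vector with covariance R."""
    total = F(0)
    for pairing in pairings(list(idx)):
        term = F(1)
        for i, j in pairing:
            term *= R[i][j]
        total += term
    return total

def corr(ab, ac, ad, bc, bd, cd):
    """4x4 correlation matrix as nested lists, indices a, b, c, d = 0, 1, 2, 3."""
    one = F(1)
    return [[one, ab, ac, ad], [ab, one, bc, bd],
            [ac, bc, one, cd], [ad, bd, cd, one]]

# Illustrative correlations (diagonally dominant, hence a valid correlation matrix).
R = corr(F(1, 3), F(1, 4), F(1, 5), F(1, 6), F(1, 7), F(1, 8))
a, b, c, d = 0, 1, 2, 3
print(moment(R, [a, a, a, a]), moment(R, [a, a, a, b]), moment(R, [a, a, b, b]))
print(moment(R, [a, a, b, c]), moment(R, [a, b, c, d]))
```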

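For every $i, j = 1, \dots, \binom{m}{2}$, with $m$ the dimension of the copula, let $M_{ij}$ be the element in the $i$-th row and $j$-th
column of the efficient lower bound $I^{-1}(\theta, G, \mathcal{P})$. Because of $\theta_i = \rho_{ab}$, $\theta_j = \rho_{cd}$
for some $a, b, c$, and $d$, we have

$$M_{ij} = E\left(Z_a Z_b - \tfrac12 \rho_{ab}\left[Z_a^2 + Z_b^2\right]\right)\left(Z_c Z_d - \tfrac12 \rho_{cd}\left[Z_c^2 + Z_d^2\right]\right).$$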


We have three cases: 

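• $|\{a, b\} \cap \{c, d\}| = 2$

$$\begin{aligned}
M_{ii} &= E\left(Z_a Z_b - \tfrac12 \rho_{ab}\left[Z_a^2 + Z_b^2\right]\right)^2 \\
&= E\left(Z_a^2 Z_b^2\right) - \rho_{ab} E\left(Z_a^3 Z_b + Z_b^3 Z_a\right) + \tfrac14 \rho_{ab}^2 E\left(Z_a^4 + 2 Z_a^2 Z_b^2 + Z_b^4\right) \\
&= \left(1 + 2\rho_{ab}^2\right) - \rho_{ab}\left(3\rho_{ab} + 3\rho_{ab}\right) + \tfrac14 \rho_{ab}^2 \left(3 + 2\left[1 + 2\rho_{ab}^2\right] + 3\right) \\
&= \left(1 - \rho_{ab}^2\right)^2
\end{aligned}$$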


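• $|\{a, b\} \cap \{c, d\}| = 1$ (without loss of generality assume $d = a$)

$$\begin{aligned}
M_{ij} &= E\left(Z_a Z_b - \tfrac12 \rho_{ab}\left[Z_a^2 + Z_b^2\right]\right)\left(Z_a Z_c - \tfrac12 \rho_{ac}\left[Z_a^2 + Z_c^2\right]\right) \\
&= E\left(Z_a^2 Z_b Z_c\right) - \tfrac12 \rho_{ab} E\left(Z_a^3 Z_c + Z_b^2 Z_a Z_c\right) \\
&\quad - \tfrac12 \rho_{ac} E\left(Z_a^3 Z_b + Z_c^2 Z_a Z_b\right) \\
&\quad + \tfrac14 \rho_{ab}\rho_{ac} E\left(Z_a^4 + Z_a^2 Z_b^2 + Z_a^2 Z_c^2 + Z_b^2 Z_c^2\right) \\
&= \left(\rho_{bc} + 2\rho_{ab}\rho_{ac}\right) - \tfrac12 \rho_{ab}\left(3\rho_{ac} + \left[\rho_{ac} + 2\rho_{ab}\rho_{bc}\right]\right) \\
&\quad - \tfrac12 \rho_{ac}\left(3\rho_{ab} + \left[\rho_{ab} + 2\rho_{ac}\rho_{bc}\right]\right) \\
&\quad + \tfrac14 \rho_{ab}\rho_{ac}\left(3 + \left[1 + 2\rho_{ab}^2\right] + \left[1 + 2\rho_{ac}^2\right] + \left[1 + 2\rho_{bc}^2\right]\right) \\
&= \tfrac12 \left(1 - \rho_{ab}^2 - \rho_{ac}^2\right)\left(2\rho_{bc} - \rho_{ab}\rho_{ac}\right) + \tfrac12 \rho_{ab}\rho_{ac}\rho_{bc}^2
\end{aligned}$$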

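• $|\{a, b\} \cap \{c, d\}| = 0$

$$\begin{aligned}
M_{ij} &= E\left(Z_a Z_b - \tfrac12 \rho_{ab}\left[Z_a^2 + Z_b^2\right]\right)\left(Z_c Z_d - \tfrac12 \rho_{cd}\left[Z_c^2 + Z_d^2\right]\right) \\
&= E\left(Z_a Z_b Z_c Z_d\right) - \tfrac12 \rho_{ab} E\left(Z_a^2 Z_c Z_d + Z_b^2 Z_c Z_d\right) \\
&\quad - \tfrac12 \rho_{cd} E\left(Z_c^2 Z_a Z_b + Z_d^2 Z_a Z_b\right) \\
&\quad + \tfrac14 \rho_{ab}\rho_{cd} E\left(Z_a^2 Z_c^2 + Z_b^2 Z_c^2 + Z_a^2 Z_d^2 + Z_b^2 Z_d^2\right) \\
&= \rho_{ab}\rho_{cd} + \rho_{ac}\rho_{bd} + \rho_{ad}\rho_{bc} - \tfrac12 \rho_{ab}\left(\left[\rho_{cd} + 2\rho_{ac}\rho_{ad}\right] + \left[\rho_{cd} + 2\rho_{bc}\rho_{bd}\right]\right) \\
&\quad - \tfrac12 \rho_{cd}\left(\left[\rho_{ab} + 2\rho_{ac}\rho_{bc}\right] + \left[\rho_{ab} + 2\rho_{ad}\rho_{bd}\right]\right) \\
&\quad + \tfrac14 \rho_{ab}\rho_{cd}\left(\left[1 + 2\rho_{ac}^2\right] + \left[1 + 2\rho_{bc}^2\right] + \left[1 + 2\rho_{ad}^2\right] + \left[1 + 2\rho_{bd}^2\right]\right) \\
&= \rho_{ac}\rho_{bd} + \rho_{ad}\rho_{bc} - \left(\rho_{ab}\rho_{ac}\rho_{ad} + \rho_{ba}\rho_{bc}\rho_{bd} + \rho_{ca}\rho_{cb}\rho_{cd} + \rho_{da}\rho_{db}\rho_{dc}\right) \\
&\quad + \tfrac12 \rho_{ab}\rho_{cd}\left(\rho_{ac}^2 + \rho_{bc}^2 + \rho_{ad}^2 + \rho_{bd}^2\right)
\end{aligned}$$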


Finally, substitution of the correlation structures in Subexample 5.2.1 and
Subexample 5.2.2 gives (5.11) and (5.16), respectively.
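
The three closed forms for the entries of the lower bound can be checked against a direct expansion of the product of the two influence functions. The Python sketch below does this in exact rational arithmetic for one hypothetical correlation matrix; the function names and the numerical values are illustrative assumptions, and agreement at one matrix is a sanity check rather than a proof.

```python
from fractions import Fraction as F

def m4(R, i, j, k, l):
    """Gaussian fourth moment E[Z_i Z_j Z_k Z_l] for covariance R."""
    return R[i][j] * R[k][l] + R[i][k] * R[j][l] + R[i][l] * R[j][k]

def entry(R, a, b, c, d):
    """E of the product of the influence functions for rho_ab and rho_cd."""
    f = [(F(1), (a, b)), (-R[a][b] / 2, (a, a)), (-R[a][b] / 2, (b, b))]
    g = [(F(1), (c, d)), (-R[c][d] / 2, (c, c)), (-R[c][d] / 2, (d, d))]
    return sum(u * v * m4(R, *s, *t) for u, s in f for v, t in g)

def case2(R, a, b):
    return (1 - R[a][b] ** 2) ** 2

def case1(R, a, b, c):
    ab, ac, bc = R[a][b], R[a][c], R[b][c]
    return (1 - ab ** 2 - ac ** 2) * (2 * bc - ab * ac) / 2 + ab * ac * bc ** 2 / 2

def case0(R, a, b, c, d):
    ab, ac, ad = R[a][b], R[a][c], R[a][d]
    bc, bd, cd = R[b][c], R[b][d], R[c][d]
    triples = ab * ac * ad + ab * bc * bd + ac * bc * cd + ad * bd * cd
    return (ac * bd + ad * bc - triples
            + ab * cd * (ac ** 2 + bc ** 2 + ad ** 2 + bd ** 2) / 2)

one = F(1)
rho = [F(1, 3), F(1, 4), F(1, 5), F(1, 6), F(1, 7), F(1, 8)]  # ab, ac, ad, bc, bd, cd
R = [[one, rho[0], rho[1], rho[2]],
     [rho[0], one, rho[3], rho[4]],
     [rho[1], rho[3], one, rho[5]],
     [rho[2], rho[4], rho[5], one]]
a, b, c, d = 0, 1, 2, 3
print(entry(R, a, b, a, b) == case2(R, a, b))
print(entry(R, a, b, c, a) == case1(R, a, b, c))
print(entry(R, a, b, c, d) == case0(R, a, b, c, d))
```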


